⚡️ Speed up function get_column_tolerance by 13%
#32
📄 13% (0.13x) speedup for `get_column_tolerance` in `datacompy/base.py`
⏱️ Runtime: 1.16 milliseconds → 1.03 milliseconds (best of 151 runs)
📝 Explanation and details
The optimization replaces a nested `.get()` call with explicit `in` checks and direct dictionary access, resulting in a 13% speedup.
Key Changes:
- Original: `tol_dict.get(column, tol_dict.get("default", 0.0))` - always performs two dictionary lookups and two method calls, because the inner `.get()` is evaluated as an argument before the outer call runs
- Optimized: `if column in tol_dict` followed by direct `tol_dict[column]` access - skips the `"default"` lookup when the column is present and avoids method call overhead
Why It's Faster:
- Direct access via `tol_dict[column]` is faster than `.get()` method calls
Performance Characteristics:
- Column present: the fast path avoids `.get()` entirely
- Column missing: the fallback uses `in` checks instead of one `.get()`
Impact on Workloads:
Based on the function references, this function is called in hot paths within `datacompy.core._intersect_compare()` and `all_mismatch()` - methods that process every column during dataframe comparison operations. Since these methods likely encounter existing columns more frequently than missing ones, the optimization will provide meaningful performance gains in typical data comparison workflows where most columns have explicit tolerance values.
✅ Correctness verification report:
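To make the equivalence being verified concrete, here is a minimal sketch rather than the exact code in `datacompy/base.py`. The one-line original is quoted from the explanation above. The function names `get_column_tolerance_original` and `get_column_tolerance_optimized`, the shape of the `"default"` fallback branch, and the sample inputs are assumptions made for illustration.

```python
from typing import Dict


def get_column_tolerance_original(column: str, tol_dict: Dict[str, float]) -> float:
    # The inner .get() is an argument, so it runs on every call,
    # even when `column` is present in the dictionary.
    return tol_dict.get(column, tol_dict.get("default", 0.0))


def get_column_tolerance_optimized(column: str, tol_dict: Dict[str, float]) -> float:
    # Fast path: one membership test plus one direct lookup.
    if column in tol_dict:
        return tol_dict[column]
    # Fallback (assumed shape): a second membership test for "default".
    if "default" in tol_dict:
        return tol_dict["default"]
    return 0.0


# Hypothetical inputs loosely mirroring the existing unit test names.
cases = [
    ("a", {"a": 0.1, "default": 0.5}),  # exact match
    ("b", {"a": 0.1, "default": 0.5}),  # falls back to "default"
    ("a", {}),                          # empty dict
    ("b", {"a": 0.1}),                  # no "default" key
    ("default", {"default": 0.5}),      # column name is "default"
]

# Pair up (original, optimized) results for each case.
results = [
    (get_column_tolerance_original(c, d), get_column_tolerance_optimized(c, d))
    for c, d in cases
]
```

Each pair in `results` should hold two equal values. Timing the two functions with `timeit` on a dictionary that contains the column would be one way to reproduce the fast-path difference locally.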
⚙️ Existing Unit Tests and Runtime
- test_base.py::test_get_column_tolerance_column_is_default
- test_base.py::test_get_column_tolerance_default
- test_base.py::test_get_column_tolerance_empty_dict
- test_base.py::test_get_column_tolerance_exact_match
- test_base.py::test_get_column_tolerance_no_default
🌀 Generated Regression Tests and Runtime
⏪ Replay Tests and Runtime
- test_pytest_teststest_snowflake_py_teststest_polars_py_teststest_sparktest_sql_spark_py_teststest_fuguete__replay_test_0.py::test_datacompy_base_get_column_tolerance
- test_pytest_teststest_sparktest_helper_py_teststest_fuguetest_fugue_polars_py_teststest_fuguetest_fugue_p__replay_test_0.py::test_datacompy_base_get_column_tolerance
🔎 Concolic Coverage Tests and Runtime
- codeflash_concolic_8h8xtkx8/tmpwed2y05m/test_concolic_coverage.py::test_get_column_tolerance
To edit these changes, `git checkout codeflash/optimize-get_column_tolerance-mi5v2ai5` and push.